Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation

Similar resources

Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation

Many real-world control applications, from economics to robotics, are characterized by the presence of multiple conflicting objectives. In these problems, the standard concept of optimality is replaced by Pareto optimality, and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimizati...

Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation

This paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto ones. Unlike previous policy-gradient multi-objective algorithms, where n optimization routines are used to obtain n solutions, our approach per...

Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation Supplementary Material

Multi-objective reinforcement learning using sets of Pareto dominating policies

Many real-world problems involve the optimization of multiple, possibly conflicting objectives. Multi-objective reinforcement learning (MORL) is a generalization of standard reinforcement learning where the scalar reward signal is extended to multiple feedback signals, in essence, one for each objective. MORL is the process of learning policies that optimize multiple criteria simultaneously. In...

Multi-Objective Reinforcement Learning

In multi-objective reinforcement learning (MORL) the agent is provided with multiple feedback signals when performing an action. These signals can be independent, complementary or conflicting. Hence, MORL is the process of learning policies that optimize multiple criteria simultaneously. In this abstract, we briefly describe our extensions to single-objective multi-armed bandits and reinforceme...

Journal

Journal title: Journal of Artificial Intelligence Research

Year: 2016

ISSN: 1076-9757

DOI: 10.1613/jair.4961